Phoneme-to-viseme Mapping for Visual Speech Recognition

Authors

  • Luca Cappelletta
  • Naomi Harte
Abstract

Phonemes are the standard modelling unit in HMM-based continuous speech recognition systems. Visemes are the equivalent unit in the visual domain, but there is less agreement on precisely what visemes are, or on how many to model on the visual side of audio-visual speech recognition systems. This paper compares the use of five viseme maps in a continuous speech recognition task. The study focuses on visual-only recognition in order to isolate the effect of the choice of viseme map. All the maps follow the phoneme-to-viseme approach and are created using either a linguistic method or a data-driven method. DCT, PCA and optical flow are used as the visual features. The best visual-only recognition on the VidTIMIT database is achieved using a linguistically motivated viseme set. These initial experiments demonstrate that the choice of visual unit requires more careful attention in audio-visual speech recognition system development.
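A phoneme-to-viseme map of the kind compared here is a many-to-one lookup from phoneme labels to viseme classes, used to relabel phoneme transcriptions into visual units before training. The Python sketch that follows is hypothetical: the groupings (bilabials, labiodentals and so on), the TIMIT-style lowercase phoneme labels, the `V_*` class names and the function `phonemes_to_visemes` are illustrative assumptions and do not reproduce any of the five maps evaluated in the paper. Merging consecutive identical visemes is also an assumed, optional step.

```python
from itertools import groupby

# Hypothetical many-to-one phoneme-to-viseme map (illustrative only; NOT one of
# the maps evaluated in the paper). Phonemes assumed to look alike on the lips
# share one viseme class. Labels are assumed TIMIT-style lowercase phonemes.
PHONEME_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "th": "V_dental", "dh": "V_dental",
    "t": "V_alveolar", "d": "V_alveolar", "n": "V_alveolar",
    "s": "V_alveolar", "z": "V_alveolar",
    "k": "V_velar", "g": "V_velar",
    "aa": "V_open", "ae": "V_open",
    "uw": "V_rounded", "ow": "V_rounded",
    "sil": "V_sil",
}


def phonemes_to_visemes(phonemes, mapping=PHONEME_TO_VISEME, merge_repeats=True):
    """Relabel a phoneme transcription as a viseme transcription.

    A phoneme missing from the map raises KeyError, so gaps in the map
    surface immediately. If merge_repeats is True, runs of the same viseme
    are collapsed into a single label (an optional, assumed convention).
    """
    visemes = [mapping[p] for p in phonemes]
    if merge_repeats:
        # groupby yields one key per run of equal consecutive labels
        visemes = [v for v, _ in groupby(visemes)]
    return visemes
```

For example, `phonemes_to_visemes(["sil", "b", "aa", "m", "p", "sil"])` returns `["V_sil", "V_bilabial", "V_open", "V_bilabial", "V_sil"]`: /m/ and /p/ fall into the same class and are merged into one label. Comparing alternative maps, whether linguistically motivated or data-driven, then amounts to passing a different dictionary as `mapping`.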

Similar resources

Comprehensive many-to-many phoneme-to-viseme mapping and its application for concatenative visual speech synthesis

The use of visemes as atomic speech units in visual speech analysis and synthesis systems is well-established. Viseme labels are determined using a many-to-one phoneme-to-viseme mapping. However, due to the visual coarticulation effects, an accurate mapping from phonemes to visemes should define a many-to-many mapping scheme. In this research it was found that neither the use of standardized no...

Decoding visemes: improving machine lipreading (PhD thesis)

This thesis is about improving machine lip-reading, that is, the classification of speech from only the visual cues of a speaker. Machine lip-reading is a niche research problem in both speech processing and computer vision. Current challenges for machine lip-reading fall into two groups: the content of the video, such as the rate at which a person is speaking; or the parameters of the vid...

Primary research on the viseme system in Standard Chinese

The study of traditional phonetics indicates that lip shape has an important effect on the articulation of consonants and vowels [1]. AVSP (Audio-Visual Speech Processing) can improve the naturalness of synthetic speech and the recognition rate of speech recognition systems. Lip-shape movements play a crucial role especially in computer-synthesized faces. The present research aims t...

Automatic Viseme Vocabulary Construction to Enhance Continuous Lip-reading

Speech is the most common communication method between humans and involves the perception of both auditory and visual channels. Automatic speech recognition focuses on interpreting the audio signals, but it has been demonstrated that video can provide information that is complementary to the audio. Thus, the study of automatic lip-reading is important and is still an open problem. One of the ke...

Phoneme-Viseme Mapping for German Video-Realistic Audio-Visual-Speech-Synthesis (IKP Working Paper NF 11)

In this working paper we introduce a German viseme set which we already use in our data-driven audio-visual synthesis system. The viseme set is essential for speech driven audio-visual synthesis due to the fact that the selection of appropriate video segments is based on the visemically transcribed input text. For text-to-speech synthesis, a transcription of the input text into the phonemic rep...

Publication date: 2012